Limitations of the empirical Fisher approximation for natural gradient descent

Neural Information Processing Systems

Natural gradient descent, which preconditions a gradient descent update with the Fisher information matrix of the underlying statistical model, is a way to capture partial second-order information. Several highly visible works have advocated an approximation known as the empirical Fisher, drawing connections between approximate second-order methods and heuristics like Adam. We dispute this argument by showing that the empirical Fisher---unlike the Fisher---does not generally capture second-order information. We further argue that the conditions under which the empirical Fisher approaches the Fisher (and the Hessian) are unlikely to be met in practice, and that, even on simple optimization problems, the pathologies of the empirical Fisher can have undesirable effects.
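The contrast between the two matrices can be illustrated on a toy problem. The sketch below is not taken from the paper: it assumes a linear regression model with unit-variance Gaussian noise, and the function names (`fisher`, `empirical_fisher`, `demo`) are hypothetical. For this model the Fisher equals the Hessian of the negative log-likelihood, X^T X, while the empirical Fisher weights each outer product by the squared residual and therefore shrinks as the fit improves.

```python
import numpy as np


def fisher(X):
    """Fisher for y | x ~ N(theta^T x, 1) (assumed model): sum_n x_n x_n^T.

    For this model it coincides with the Hessian of the negative
    log-likelihood and does not depend on the targets y.
    """
    return X.T @ X


def empirical_fisher(X, y, theta):
    """Sum of outer products of per-example log-likelihood gradients,
    evaluated at the observed targets y_n."""
    residuals = y - X @ theta
    per_example_grads = X * residuals[:, None]  # row n is (y_n - theta^T x_n) x_n
    return per_example_grads.T @ per_example_grads


def demo(seed=0, n=200, d=3, noise=0.1):
    """Build a synthetic regression problem and compare both matrices
    at the least-squares solution."""
    rng = np.random.default_rng(seed)
    X = rng.normal(size=(n, d))
    theta_true = rng.normal(size=d)
    y = X @ theta_true + noise * rng.normal(size=n)
    theta_hat = np.linalg.lstsq(X, y, rcond=None)[0]
    return fisher(X), empirical_fisher(X, y, theta_hat)


if __name__ == "__main__":
    F, EF = demo()
    print("trace(Fisher)           =", np.trace(F))
    print("trace(empirical Fisher) =", np.trace(EF))
```

With small residuals the empirical Fisher is close to zero even though the curvature of the loss is unchanged, so preconditioning with it would produce a very different step from preconditioning with the Fisher.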



Reviews: Limitations of the empirical Fisher approximation for natural gradient descent

Neural Information Processing Systems

Originality: the paper lacks a sound and novel contribution. Theoretically, there is only one minor result, as stated above. Technically, there is no systematic experimental study on real deep networks. The main contribution is a discussion of two different formulations of the Fisher matrix. Although the authors take a sophisticated route through the generalized Gauss-Newton (GGN) matrix, the main reason the two formulations differ is that the so-called empirical Fisher relies on y_n (the target for the neural network output). If one considers y_n to be randomly distributed with fixed variance around the neural network output, the two formulations are equivalent; otherwise, there is a scale parameter in eq. (3) that shrinks, and this shrinking, together with damping, makes the two formulations differ.
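For reference, the two formulations contrasted in this review are commonly written as follows. The notation here is an assumption (a model p_\theta(y | x) and N training pairs (x_n, y_n)) and need not match the paper's numbered equations.

```latex
\begin{align}
F(\theta) &= \sum_{n=1}^{N} \mathbb{E}_{y \sim p_\theta(y \mid x_n)}
  \left[ \nabla_\theta \log p_\theta(y \mid x_n)\,
         \nabla_\theta \log p_\theta(y \mid x_n)^\top \right]
  && \text{(Fisher)} \\
\tilde{F}(\theta) &= \sum_{n=1}^{N}
  \nabla_\theta \log p_\theta(y_n \mid x_n)\,
  \nabla_\theta \log p_\theta(y_n \mid x_n)^\top
  && \text{(empirical Fisher)}
\end{align}
```

The only difference is where the targets come from: the Fisher averages over targets drawn from the model's own predictive distribution, whereas the empirical Fisher plugs in the observed y_n.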


Reviews: Limitations of the empirical Fisher approximation for natural gradient descent

Neural Information Processing Systems

All reviewers were positive about the paper. The paper corrects several common incorrect assertions and misleading derivations in the literature on natural gradient algorithms. The exposition is remarkably clear, and the paper has the potential to serve as a reference on the topic. It is clearly of broad interest to the machine learning community. We recommend taking the reviewers' comments and suggestions into account while preparing the final camera-ready version of the paper.


Limitations of the empirical Fisher approximation for natural gradient descent

Kunstner, Frederik, Hennig, Philipp, Balles, Lukas

Neural Information Processing Systems

Papers published at the Neural Information Processing Systems Conference.


Limitations of the Empirical Fisher Approximation

Kunstner, Frederik, Balles, Lukas, Hennig, Philipp

arXiv.org Machine Learning
